CI: Add behavioral validation tests for topology, problems, and operations#91
CI: Add behavioral validation tests for topology, problems, and operations#91renecannao merged 9 commits intomasterfrom
Conversation
|
Warning Rate limit exceeded
Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 20 minutes and 11 seconds. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughThree new functional test scripts were added to validate topology discovery, problem detection, and topology operations. The CI workflow was updated to run these tests between smoke and regression stages. The shared test helper Changes
Sequence Diagram(s)sequenceDiagram
participant TestRunner as Test Runner
participant Orchestrator as Orchestrator API
participant MySQL1 as mysql1 (master)
participant MySQL2 as mysql2 (replica)
participant MySQL3 as mysql3 (replica)
TestRunner->>Orchestrator: wait for /api/cluster and /api/instance readiness
TestRunner->>Orchestrator: POST /api/discover/mysql1/3306 (discover topology)
Orchestrator->>MySQL1: query instance status, GTID, read_only, uptime
Orchestrator->>MySQL2: query instance status, replication status
Orchestrator->>MySQL3: query instance status, replication status
Orchestrator-->>TestRunner: cluster JSON (instances, master/replica info)
alt Problem detection test
TestRunner->>MySQL2: execute SQL to stop replication
TestRunner->>Orchestrator: POST /api/discover/mysql2/3306
Orchestrator->>MySQL2: detect replication stopped
Orchestrator-->>TestRunner: /api/problems includes mysql2 problem
TestRunner->>MySQL2: restart replication
TestRunner->>Orchestrator: POST /api/discover/mysql2/3306
end
alt Topology operations test
TestRunner->>Orchestrator: POST /api/relocate/mysql3
Orchestrator->>MySQL3: orchestrate move (exec SQL or GTID operations)
Orchestrator-->>TestRunner: operation result
TestRunner->>MySQL3: query @@read_only and replication source to verify
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request introduces three new functional test scripts to validate orchestrator's problem detection, topology discovery, and topology refactoring operations. My review identified a bug in the errant GTID test cleanup where the database drop operation would fail due to super_read_only being enabled, and I have provided a correction. Additionally, I recommended replacing fixed sleep intervals with polling loops for better test reliability, suggested consolidating redundant Python parsing logic, and noted a discrepancy between a comment and the actual implementation regarding timeout thresholds.
| $COMPOSE exec -T mysql3 mysql -uroot -ptestpass -e " | ||
| SET GLOBAL read_only=0; | ||
| DROP DATABASE IF EXISTS errant_detect_test; | ||
| SET GLOBAL read_only=1; | ||
| " 2>/dev/null |
There was a problem hiding this comment.
The cleanup for the errant GTID test will fail silently. After the injection at lines 158-164, super_read_only is set to 1. The cleanup attempts to DROP DATABASE, which is a write operation that will be blocked by super_read_only=1. The error is suppressed by 2>/dev/null.
The cleanup needs to disable super_read_only to be able to drop the database and then restore it.
| $COMPOSE exec -T mysql3 mysql -uroot -ptestpass -e " | |
| SET GLOBAL read_only=0; | |
| DROP DATABASE IF EXISTS errant_detect_test; | |
| SET GLOBAL read_only=1; | |
| " 2>/dev/null | |
| $COMPOSE exec -T mysql3 mysql -uroot -ptestpass -e " | |
| SET GLOBAL read_only=0; | |
| SET GLOBAL super_read_only=0; | |
| DROP DATABASE IF EXISTS errant_detect_test; | |
| SET GLOBAL read_only=1; | |
| SET GLOBAL super_read_only=1; | |
| " 2>/dev/null |
| echo "Setting mysql2 writeable via API..." | ||
| RESULT=$(curl -s --max-time 10 "$ORC_URL/api/set-writeable/mysql2/3306" 2>/dev/null) | ||
|
|
||
| sleep 2 |
There was a problem hiding this comment.
Using a fixed sleep (here and on line 158) can make tests flaky if the operation takes longer than expected. A more robust approach is to poll for the expected read_only status in a loop with a timeout.
For example:
SUCCESS=false
for i in $(seq 1 10); do
# check condition
if [ ... ]; then
SUCCESS=true
break
fi
sleep 1
done
# assert SUCCESS is true| HAS_MYSQL2=$(echo "$PROBLEMS" | python3 -c " | ||
| import json, sys | ||
| problems = json.load(sys.stdin) | ||
| for p in problems: | ||
| h = p.get('Key', {}).get('Hostname', '') | ||
| if 'mysql2' in h: | ||
| sys.exit(0) | ||
| sys.exit(1) | ||
| " 2>/dev/null && echo "yes" || echo "no") |
There was a problem hiding this comment.
| ORC_MASTER_HOST=$(echo "$MASTER_INFO" | python3 -c "import json,sys; print(json.load(sys.stdin)['Key']['Hostname'])" 2>/dev/null || echo "") | ||
| ORC_MASTER_RO=$(echo "$MASTER_INFO" | python3 -c "import json,sys; print(json.load(sys.stdin)['ReadOnly'])" 2>/dev/null || echo "") |
There was a problem hiding this comment.
| echo "" | ||
| echo "--- Test 5: Instance freshness ---" | ||
|
|
||
| # Verify LastChecked is recent (within last 60 seconds) |
There was a problem hiding this comment.
This comment states the check is for the "last 60 seconds", but the implementation at line 193 uses a threshold of 120 seconds. The comment should be updated to match the code to avoid confusion.
| # Verify LastChecked is recent (within last 60 seconds) | |
| # Verify LastChecked is recent (within last 120 seconds) |
| # Use move-up to move mysql3 from under mysql2 back to mysql1 | ||
| echo "Moving mysql3 up via API..." | ||
| RESULT=$(curl -s --max-time 30 "$ORC_URL/api/move-up/mysql3/3306" 2>/dev/null) | ||
| CODE=$(echo "$RESULT" | python3 -c "import json,sys; print(json.load(sys.stdin).get('Code',''))" 2>/dev/null || echo "") |
There was a problem hiding this comment.
This line is identical to line 59 and line 118. To improve maintainability and reduce duplication, consider creating a small helper function to extract the Code field from the API JSON response.
For example:
get_api_code() {
echo "$1" | python3 -c "import json,sys; print(json.load(sys.stdin).get('Code',''))" 2>/dev/null || echo ""
}
# Usage
CODE=$(get_api_code "$RESULT")e81056f to
fa42679
Compare
…89, #90) Add three new test scripts that verify orchestrator actually works correctly, not just that API endpoints return 200: - test-topology-discovery.sh: validates orchestrator's view of the topology matches actual MySQL state (master/replica roles, read_only, GTID tracking, replication threads, instance freshness) - test-problem-detection.sh: verifies orchestrator detects and clears replication problems (stopped replication, writable replica, errant GTID) by inducing problems and checking the API response - test-topology-operations.sh: tests actual topology refactoring operations (relocate, move-up, repoint, set-read-only/writeable) and cross-references results with direct MySQL queries
The test created extra_db on mysql2 but only cleaned up the channel on mysql3. Add DROP DATABASE to prevent residual state.
34f3c7c to
cbda41b
Compare
The \G (vertical format) flag is unreliable when passed through docker exec pipes — it returns empty output on MySQL 8.4 and 9.x. Use tabular SHOW REPLICA STATUS instead; column 2 is always the source host.
awk splits on whitespace by default, picking up "for" from "Waiting for source..." (column 1 = Replica_IO_State). Use -F'\t' to split on tabs so column 2 = Source_Host.
Orchestrator caches instance state and only refreshes on its poll interval. After STOP REPLICA / SET GLOBAL read_only / injecting errant GTIDs, the tests need to force a re-discover call so orchestrator updates its cached data before the test assertions check the API.
- Verify data exists on mysql2 before setting up the extra channel - Replace single sleep+check with a 15s retry loop for data replication - Add better error messages for debugging if setup or replication fails
There was a problem hiding this comment.
🧹 Nitpick comments (3)
tests/functional/test-topology-discovery.sh (1)
5-5: Add error handling forcdcommand.If the
cdfails (e.g., directory doesn't exist), the script will continue with incorrect working directory, potentially causing confusing failures.Proposed fix
-cd "$(dirname "$0")/../.." +cd "$(dirname "$0")/../.." || { echo "Failed to cd to repository root"; exit 1; }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/functional/test-topology-discovery.sh` at line 5, The script currently runs cd "$(dirname "$0")/../.." without checking for failure; update the script so the cd command's failure is detected and handled (for example, test the exit status of cd or enable errexit) and exit with a clear error message; specifically modify the invocation around cd "$(dirname "$0")/../.." (or the script's top-of-file bootstrapping) to abort the script and log why (unable to change to the expected working directory) when the cd fails.tests/functional/test-problem-detection.sh (1)
5-5: Add error handling forcdcommand.Consistent with the other test scripts, this should fail fast if the directory change fails.
Proposed fix
-cd "$(dirname "$0")/../.." +cd "$(dirname "$0")/../.." || { echo "Failed to cd to repository root"; exit 1; }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/functional/test-problem-detection.sh` at line 5, The cd command in tests/functional/test-problem-detection.sh should fail fast like the other scripts: update the cd "$(dirname "$0")/../.." invocation to check its exit status and abort on failure (e.g., append an immediate failure handler such as || exit 1 or an equivalent error message + exit) so the script stops if the directory change fails.tests/functional/test-topology-operations.sh (1)
5-5: Add error handling forcdcommand.Same issue as in other test scripts — the script should fail fast if the directory change fails.
Proposed fix
-cd "$(dirname "$0")/../.." +cd "$(dirname "$0")/../.." || { echo "Failed to cd to repository root"; exit 1; }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/functional/test-topology-operations.sh` at line 5, The cd command cd "$(dirname "$0")/../.." should fail fast if it cannot change directories; update that line to check the exit status and abort on failure (e.g., append a conditional OR that prints an error to stderr and exits non‑zero), or ensure the script enables strict failure (set -e) before running the cd; target the exact cd invocation (cd "$(dirname "$0")/../..") when applying the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@tests/functional/test-problem-detection.sh`:
- Line 5: The cd command in tests/functional/test-problem-detection.sh should
fail fast like the other scripts: update the cd "$(dirname "$0")/../.."
invocation to check its exit status and abort on failure (e.g., append an
immediate failure handler such as || exit 1 or an equivalent error message +
exit) so the script stops if the directory change fails.
In `@tests/functional/test-topology-discovery.sh`:
- Line 5: The script currently runs cd "$(dirname "$0")/../.." without checking
for failure; update the script so the cd command's failure is detected and
handled (for example, test the exit status of cd or enable errexit) and exit
with a clear error message; specifically modify the invocation around cd
"$(dirname "$0")/../.." (or the script's top-of-file bootstrapping) to abort the
script and log why (unable to change to the expected working directory) when the
cd fails.
In `@tests/functional/test-topology-operations.sh`:
- Line 5: The cd command cd "$(dirname "$0")/../.." should fail fast if it
cannot change directories; update that line to check the exit status and abort
on failure (e.g., append a conditional OR that prints an error to stderr and
exits non‑zero), or ensure the script enables strict failure (set -e) before
running the cd; target the exact cd invocation (cd "$(dirname "$0")/../..") when
applying the change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: f699f0c4-2392-4ad4-a3b1-fea6df43c768
📒 Files selected for processing (6)
.github/workflows/functional.ymltests/functional/lib.shtests/functional/test-named-channels.shtests/functional/test-problem-detection.shtests/functional/test-topology-discovery.shtests/functional/test-topology-operations.sh
After STOP+START REPLICA, mysql2 may still appear in /api/problems due to other issues (replication lag, errant GTID from stop/start cycle). Instead of checking /api/problems for any mysql2 entry, verify that the SQL thread is running again via the instance API. Also force re-discovery on each poll iteration to ensure fresh state.
There was a problem hiding this comment.
Actionable comments posted: 5
🧹 Nitpick comments (1)
tests/functional/test-named-channels.sh (1)
65-65: Use_for the intentionally unused loop variable.Line 65 uses
ibut never reads it, which triggers SC2034 and adds noise.#!/bin/bash # Verify the loop variable is not referenced anywhere in this script. rg -n --type=sh -C2 '\bfor\s+i\s+in\s+\$\(seq 1 15\)' tests/functional/test-named-channels.sh rg -n --type=sh '\$i\b' tests/functional/test-named-channels.sh🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/functional/test-named-channels.sh` at line 65, The loop variable `i` in the `for i in $(seq 1 15); do` line is intentionally unused and should be renamed to `_` to avoid the SC2034 unused-variable warning; update the loop to `for _ in $(seq 1 15); do` and verify there are no references to `$i` elsewhere (if any exist, replace or remove them) so the change is safe.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/functional/test-named-channels.sh`:
- Around line 129-131: The DROP DATABASE cleanup currently suppresses stderr and
ignores failures for the command `$COMPOSE exec -T mysql2 mysql -uroot
-ptestpass -e "DROP DATABASE IF EXISTS extra_db;"`; change this so failures are
detected and cause the test run to fail: remove the `2>/dev/null` suppression,
capture and check the command exit status (or enable `set -e`/`set -o errexit`
for the script), and on non-zero exit print a clear error message and exit
non-zero so stale state cannot leak into later tests.
In `@tests/functional/test-problem-detection.sh`:
- Line 23: The mysql state-changing exec lines (e.g. the call using $COMPOSE
exec -T mysql2 mysql -uroot -ptestpass -e "$STOP_SQL") suppress stderr and
ignore exit codes; update each such mutation call (including analogous
START/INSERT/UPDATE/DELETE commands) to capture stdout/stderr and check the
command's exit status immediately, and if non-zero print a descriptive error
including the captured stderr and exit non-zero to fail the test fast. Locate
the commands by the pattern "$COMPOSE exec -T mysql* mysql -uroot -ptestpass -e
\"...\"" and modify them to redirect/collect stderr (or assign output to a
variable), test $? (or the command result), and abort with an explanatory
message when the command fails.
- Around line 100-105: The current check marks CLEARED when only SQL_RUNNING is
"True", which can pass while IO_RUNNING is "False"; update the conditional that
sets CLEARED (the block using SQL_RUNNING, IO_RUNNING, CLEARED and REPL_STATE)
to require both SQL_RUNNING = "True" and IO_RUNNING = "True" before setting
CLEARED and breaking the loop, and adjust the echo message to reflect both
thread states (e.g., "SQL=True, IO=True") so the test only passes when both
threads are healthy.
- Line 15: discover_topology is being treated as optional but must be a hard
precondition; update the test to fail immediately when discover_topology
"mysql1" returns non-zero by checking its exit status and exiting with a
non-zero code and an explanatory stderr message (e.g., run discover_topology
"mysql1" and if it fails print "Topology discovery failed for mysql1" to stderr
and exit 1), or alternatively enable strict shell error handling at the top of
the script (set -euo pipefail) to enforce failure; refer to the
discover_topology invocation to locate where to add the check.
- Around line 5-6: The script should fail fast if bootstrap steps fail: add a
strict shell flag (e.g., set -euo pipefail or at minimum set -e) at the top of
the script (immediately after the shebang) so that failures in the cd "$(dirname
"$0")/../.." or source tests/functional/lib.sh commands abort the run; ensure
the flags are applied before those two lines so any failure in cd or source
stops the script.
---
Nitpick comments:
In `@tests/functional/test-named-channels.sh`:
- Line 65: The loop variable `i` in the `for i in $(seq 1 15); do` line is
intentionally unused and should be renamed to `_` to avoid the SC2034
unused-variable warning; update the loop to `for _ in $(seq 1 15); do` and
verify there are no references to `$i` elsewhere (if any exist, replace or
remove them) so the change is safe.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: e9671a70-a5c5-46e7-96f5-90bb0c864147
📒 Files selected for processing (2)
tests/functional/test-named-channels.shtests/functional/test-problem-detection.sh
| # Clean up test database on mysql2 | ||
| $COMPOSE exec -T mysql2 mysql -uroot -ptestpass -e "DROP DATABASE IF EXISTS extra_db;" 2>/dev/null | ||
|
|
There was a problem hiding this comment.
Handle cleanup failures explicitly to avoid cross-test contamination.
Line 130 suppresses stderr and does not check command success. If DROP DATABASE fails, stale state can leak into later functional tests in the same environment.
Suggested fix
# Clean up test database on mysql2
-$COMPOSE exec -T mysql2 mysql -uroot -ptestpass -e "DROP DATABASE IF EXISTS extra_db;" 2>/dev/null
+if ! $COMPOSE exec -T mysql2 mysql -uroot -ptestpass -e "DROP DATABASE IF EXISTS extra_db;" 2>/dev/null; then
+ fail "Cleanup: failed to drop extra_db on mysql2"
+else
+ pass "Cleanup: dropped extra_db on mysql2"
+fi📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # Clean up test database on mysql2 | |
| $COMPOSE exec -T mysql2 mysql -uroot -ptestpass -e "DROP DATABASE IF EXISTS extra_db;" 2>/dev/null | |
| # Clean up test database on mysql2 | |
| if ! $COMPOSE exec -T mysql2 mysql -uroot -ptestpass -e "DROP DATABASE IF EXISTS extra_db;" 2>/dev/null; then | |
| echo "Error: Failed to drop extra_db on mysql2" | |
| exit 1 | |
| fi |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/functional/test-named-channels.sh` around lines 129 - 131, The DROP
DATABASE cleanup currently suppresses stderr and ignores failures for the
command `$COMPOSE exec -T mysql2 mysql -uroot -ptestpass -e "DROP DATABASE IF
EXISTS extra_db;"`; change this so failures are detected and cause the test run
to fail: remove the `2>/dev/null` suppression, capture and check the command
exit status (or enable `set -e`/`set -o errexit` for the script), and on
non-zero exit print a clear error message and exit non-zero so stale state
cannot leak into later tests.
| cd "$(dirname "$0")/../.." | ||
| source tests/functional/lib.sh |
There was a problem hiding this comment.
Fail fast when bootstrap steps fail.
If cd or source fails, the script still continues (no set -e), which can cascade into misleading failures later.
Suggested fix
-cd "$(dirname "$0")/../.."
-source tests/functional/lib.sh
+cd "$(dirname "$0")/../.." || { echo "FATAL: unable to cd to repository root"; exit 1; }
+source tests/functional/lib.sh || { echo "FATAL: unable to load tests/functional/lib.sh"; exit 1; }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| cd "$(dirname "$0")/../.." | |
| source tests/functional/lib.sh | |
| cd "$(dirname "$0")/../.." || { echo "FATAL: unable to cd to repository root"; exit 1; } | |
| source tests/functional/lib.sh || { echo "FATAL: unable to load tests/functional/lib.sh"; exit 1; } |
🧰 Tools
🪛 Shellcheck (0.11.0)
[warning] 5-5: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.
(SC2164)
[info] 6-6: Not following: tests/functional/lib.sh was not specified as input (see shellcheck -x).
(SC1091)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/functional/test-problem-detection.sh` around lines 5 - 6, The script
should fail fast if bootstrap steps fail: add a strict shell flag (e.g., set
-euo pipefail or at minimum set -e) at the top of the script (immediately after
the shebang) so that failures in the cd "$(dirname "$0")/../.." or source
tests/functional/lib.sh commands abort the run; ensure the flags are applied
before those two lines so any failure in cd or source stops the script.
| START_SQL=$(mysql_start_replica_sql) | ||
|
|
||
| wait_for_orchestrator || { echo "FATAL: Orchestrator not reachable"; exit 1; } | ||
| discover_topology "mysql1" |
There was a problem hiding this comment.
Treat topology discovery as a hard precondition.
Line 15 ignores discover_topology failure, so tests can run against partial/undiscovered state and produce noisy results.
Suggested fix
-discover_topology "mysql1"
+discover_topology "mysql1" || { echo "FATAL: initial topology discovery failed"; exit 1; }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| discover_topology "mysql1" | |
| discover_topology "mysql1" || { echo "FATAL: initial topology discovery failed"; exit 1; } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/functional/test-problem-detection.sh` at line 15, discover_topology is
being treated as optional but must be a hard precondition; update the test to
fail immediately when discover_topology "mysql1" returns non-zero by checking
its exit status and exiting with a non-zero code and an explanatory stderr
message (e.g., run discover_topology "mysql1" and if it fails print "Topology
discovery failed for mysql1" to stderr and exit 1), or alternatively enable
strict shell error handling at the top of the script (set -euo pipefail) to
enforce failure; refer to the discover_topology invocation to locate where to
add the check.
|
|
||
| # Stop replication on mysql2 | ||
| echo "Stopping replication on mysql2..." | ||
| $COMPOSE exec -T mysql2 mysql -uroot -ptestpass -e "$STOP_SQL" 2>/dev/null |
There was a problem hiding this comment.
Check exit codes for state-changing MySQL commands.
These commands mutate DB state but suppress stderr and do not verify success. If any command fails, subsequent assertions may incorrectly blame orchestrator behavior.
Suggested pattern
+# helper
+run_mysql() {
+ local host="$1"
+ local sql="$2"
+ if ! $COMPOSE exec -T "$host" mysql -uroot -ptestpass -e "$sql" >/dev/null; then
+ fail "MySQL command failed on ${host}" "$sql"
+ return 1
+ fi
+ return 0
+}-$COMPOSE exec -T mysql2 mysql -uroot -ptestpass -e "$STOP_SQL" 2>/dev/null
+run_mysql "mysql2" "$STOP_SQL" || exit 1Apply the same pattern to the other mutation calls.
Also applies to: 81-81, 122-122, 150-150, 179-185, 216-220
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/functional/test-problem-detection.sh` at line 23, The mysql
state-changing exec lines (e.g. the call using $COMPOSE exec -T mysql2 mysql
-uroot -ptestpass -e "$STOP_SQL") suppress stderr and ignore exit codes; update
each such mutation call (including analogous START/INSERT/UPDATE/DELETE
commands) to capture stdout/stderr and check the command's exit status
immediately, and if non-zero print a descriptive error including the captured
stderr and exit non-zero to fail the test fast. Locate the commands by the
pattern "$COMPOSE exec -T mysql* mysql -uroot -ptestpass -e \"...\"" and modify
them to redirect/collect stderr (or assign output to a variable), test $? (or
the command result), and abort with an explanatory message when the command
fails.
| SQL_RUNNING=$(echo "$REPL_STATE" | cut -d: -f1) | ||
| IO_RUNNING=$(echo "$REPL_STATE" | cut -d: -f2) | ||
| if [ "$SQL_RUNNING" = "True" ]; then | ||
| CLEARED=true | ||
| echo "Replication recovered after ${i}s (SQL=True, IO=${IO_RUNNING})" | ||
| break |
There was a problem hiding this comment.
Require both SQL and IO threads for “cleared” state.
Current clear condition only checks SQL=True; replication can still be unhealthy with IO=False, causing a false pass.
Suggested fix
- if [ "$SQL_RUNNING" = "True" ]; then
+ if [ "$SQL_RUNNING" = "True" ] && [ "$IO_RUNNING" = "True" ]; then
CLEARED=true
- echo "Replication recovered after ${i}s (SQL=True, IO=${IO_RUNNING})"
+ echo "Replication recovered after ${i}s (SQL=True, IO=True)"
break
fi📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| SQL_RUNNING=$(echo "$REPL_STATE" | cut -d: -f1) | |
| IO_RUNNING=$(echo "$REPL_STATE" | cut -d: -f2) | |
| if [ "$SQL_RUNNING" = "True" ]; then | |
| CLEARED=true | |
| echo "Replication recovered after ${i}s (SQL=True, IO=${IO_RUNNING})" | |
| break | |
| SQL_RUNNING=$(echo "$REPL_STATE" | cut -d: -f1) | |
| IO_RUNNING=$(echo "$REPL_STATE" | cut -d: -f2) | |
| if [ "$SQL_RUNNING" = "True" ] && [ "$IO_RUNNING" = "True" ]; then | |
| CLEARED=true | |
| echo "Replication recovered after ${i}s (SQL=True, IO=True)" | |
| break |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/functional/test-problem-detection.sh` around lines 100 - 105, The
current check marks CLEARED when only SQL_RUNNING is "True", which can pass
while IO_RUNNING is "False"; update the conditional that sets CLEARED (the block
using SQL_RUNNING, IO_RUNNING, CLEARED and REPL_STATE) to require both
SQL_RUNNING = "True" and IO_RUNNING = "True" before setting CLEARED and breaking
the loop, and adjust the echo message to reflect both thread states (e.g.,
"SQL=True, IO=True") so the test only passes when both threads are healthy.
Summary
The existing functional tests are mostly HTTP status code checks ("does the endpoint return 200?"). This PR adds real behavioral tests that verify orchestrator actually works correctly.
New test scripts
test-topology-discovery.sh (#88) — Cross-references orchestrator's API data against direct MySQL queries:
test-problem-detection.sh (#89) — Induces problems and verifies detection:
test-topology-operations.sh (#90) — Tests actual refactoring operations:
relocate: move mysql3 under mysql2, verify via API and direct MySQLmove-up: move mysql3 back under mysql1repoint: GTID-based repoint to different masterset-read-only/set-writeable: toggle read_only via APIDepends on #85
Closes #88, closes #89, closes #90
Test plan
Summary by CodeRabbit
Tests
Chores